Modelling of multivariate densities is a core component in many signal processing, pattern recognition and machine learning applications. The modelling is often done via Gaussian mixture models (GMMs), which use computationally expensive and potentially unstable training algorithms. We provide an overview of a fast and robust implementation of GMMs in the C++ language, employing multi-threaded versions of the Expectation Maximisation (EM) and k-means training algorithms. Multi-threading is achieved through reformulation of the EM and k-means algorithms into a MapReduce-like framework. Furthermore, the implementation uses several techniques to improve numerical stability and modelling accuracy. We demonstrate that the multi-threaded implementation achieves a speedup of an order of magnitude on a recent 16-core machine, and that it can achieve higher modelling accuracy than a previously well-established publicly accessible implementation. The multi-threaded implementation is included as a user-friendly class in recent releases of the open source Armadillo C++ linear algebra library. The library is provided under the permissive Apache 2.0 license, allowing unencumbered use in commercial products.
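To illustrate what a MapReduce-like reformulation of k-means can look like, the following is a minimal C++ sketch of a single k-means update. It is not the Armadillo implementation: the names `Partial` and `kmeans_step` are hypothetical, the data are one-dimensional for brevity, and `std::thread` is assumed here as a stand-in for whatever threading mechanism the library actually uses. Each worker accumulates per-cluster sums and counts over its own chunk of the data (the "map" stage), and the partial results are then combined into updated means (the "reduce" stage).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Output of one worker: per-cluster running sums and sample counts
// accumulated over that worker's chunk of the data.
struct Partial {
    std::vector<double> sum;
    std::vector<std::size_t> count;
};

// One k-means update on 1-D data, split across n_threads workers.
// Hypothetical helper for illustration only; not part of Armadillo.
std::vector<double> kmeans_step(const std::vector<double>& x,
                                const std::vector<double>& means,
                                std::size_t n_threads) {
    assert(!means.empty());
    if (n_threads == 0) n_threads = 1;
    const std::size_t k = means.size();
    const std::size_t chunk = (x.size() + n_threads - 1) / n_threads;

    std::vector<Partial> parts(n_threads);
    for (auto& p : parts) {
        p.sum.assign(k, 0.0);
        p.count.assign(k, 0);
    }

    // "Map" stage: each thread writes only to its own Partial,
    // so no locking is needed.
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < n_threads; ++t) {
        pool.emplace_back([&x, &means, &parts, k, chunk, t] {
            const std::size_t lo = std::min(t * chunk, x.size());
            const std::size_t hi = std::min(lo + chunk, x.size());
            for (std::size_t i = lo; i < hi; ++i) {
                std::size_t best = 0;  // index of the nearest mean
                for (std::size_t j = 1; j < k; ++j) {
                    if (std::abs(x[i] - means[j]) < std::abs(x[i] - means[best])) {
                        best = j;
                    }
                }
                parts[t].sum[best] += x[i];
                parts[t].count[best] += 1;
            }
        });
    }
    for (auto& th : pool) th.join();

    // "Reduce" stage: combine the partial sums and counts.
    // A cluster that received no samples keeps its previous mean.
    std::vector<double> updated(means);
    for (std::size_t j = 0; j < k; ++j) {
        double s = 0.0;
        std::size_t c = 0;
        for (const auto& p : parts) {
            s += p.sum[j];
            c += p.count[j];
        }
        if (c > 0) updated[j] = s / static_cast<double>(c);
    }
    return updated;
}
```

Calling `kmeans_step(x, means, n_threads)` repeatedly until the means stop changing gives a basic k-means loop. Because the workers share no mutable state, the same accumulate-then-combine pattern could in principle be applied to the sufficient statistics of the EM algorithm as well.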